Abstract

Respondent-Driven Sampling is a popular technique for sampling hidden
populations. This paper models Respondent-Driven Sampling as a Markov process
indexed by a tree. Our main results show that the Volz-Heckathorn estimator is
asymptotically normal below a critical threshold. The key technical difficulties
stem from (i) the dependence between samples and (ii) the tree structure which
characterizes the dependence. The theorems allow the growth rate of the tree to
exceed one and suggest that this growth rate should not be too large. To illustrate
the usefulness of these results beyond their obvious use, an example shows that in
certain cases the sample average is preferable to inverse probability weighting. We
provide a test statistic to distinguish between these two cases.


1 Introduction 

Classical sampling requires a sampling frame, a list of individuals in the target
population with a method to contact each individual (e.g. a phone number). For many
populations, constructing a sampling frame is infeasible. Network driven sampling enables
researchers to access populations of people, webpages, and proteins that are otherwise
difficult to reach. These techniques go by many names: web crawling,
Respondent-Driven Sampling, breadth-first search, snowball sampling, co-immunoprecipitation,
and chromatin immunoprecipitation. In each application, the only way to reach the
population of interest is by asking participants to refer friends.

Respondent-Driven Sampling (RDS) serves as a motivating example for this paper.
The Centers for Disease Control, the World Health Organization, and the Joint
United Nations Programme on HIV/AIDS have invested in RDS to reach marginalized
and hard-to-reach populations [Heckathorn, 1997; WHO, 2013]. Each individual $i$ in
the population has a corresponding feature $y_i$ (e.g. $y_i \in \{0,1\}$ and $y_i = 1$ if $i$ is
HIV+). Using only the sampled individuals, we wish to make inferences about the
average value of $y_i$ across the entire population, denoted as $\mu$ (e.g. the proportion of the
population that is HIV+). Extensive previous statistical research has proposed various
estimators of $\mu$, which are approximately unbiased based upon various types of
models for an RDS sample [Salganik and Heckathorn, 2004; Volz and Heckathorn, 2008;
Gile, 2011]. We note that in the papers cited above (except [Gile, 2011]), RDS is
assumed to sample with replacement. Previous research has also explored the variance of
these estimators [Goel and Salganik, 2009; Rohe, 2015]. This paper studies the
asymptotic distribution of statistics related to these estimators.

Results on asymptotic distributions for RDS are useful for two obvious reasons.
First, they allow us to construct asymptotic confidence intervals for $\mu$. Second, they
provide essential tools to test various statistical hypotheses. The only central limit
theorem considered in the RDS literature studied the case when the tree
indexed process reduces to a Markov chain [Goel and Salganik, 2009]; this presumes
that each individual refers exactly one person. Previous research suggests that the
number of referrals from each individual is fundamental in determining the variance of
common estimators [Rohe, 2015]. This paper establishes two central limit theorems in
settings which allow for multiple referrals.

The main results apply to both the sample average and the Volz-Heckathorn
estimator, which is an approximation of the inverse probability weighted estimator (cf.
Remark 1). Because the inverse probability weighted (IPW) estimator and its extensions
are asymptotically unbiased, these estimators are often preferred to the sample average.
However, sometimes survey weights are not needed and they only introduce additional
variance to the estimator [Bollen et al., 2016]. This issue is particularly salient when
sampling weights are highly heterogeneous, as is often the case in RDS. Proposition 3
shows that if the outcomes $y_i$ are uncorrelated with the sampling weights, then the
sample average is unbiased. Theorem 3 extends this result to RDS to show that the IPW
estimator can have a larger variance than the sample average. Taken together, these
results imply that the sample average can have a lower mean squared error (MSE) than
the IPW estimator. Section 4 introduces an estimator of the bias of the sample average.
The main results provide a path to test the null hypothesis that the bias is zero. This
test can be used to select between the sample average and the IPW estimator. Section 6
studies this routine with the AddHealth social network.


2 Notation 


Following [Goel and Salganik, 2009] and [Rohe, 2015], the results below model the
network sampling mechanism as a tree indexed Markov process on a graph. There
are many assumptions in this model which are incorrect in practice. However, like
the i.i.d. assumption, it allows for tractable calculations. In the simulations, we show
that the theory derived from this model provides a good approximation for a more
realistic sampling model. [Lu et al., 2012] studies the sensitivities of the estimators to
this model.

Let $G = (V, E)$ be a finite, undirected, and simple graph with vertex set $V =
\{1, \dots, N\}$ and edge set $E$. $V$ contains the individuals in the population and $E$ describes
how they are related to one another. As discussed in the introduction, $y : V \to \mathbb{R}$ is
a fixed real-valued function on the state space $V$; these are the node features that are
measured on the sampled nodes. The target of RDS is to estimate $\mu = N^{-1} \sum_{i \in V} y(i)$.

If each sampled node referred exactly one friend, then the Markov sampling
procedure would be a Markov chain. Several classical central limit theorems exist for this
model; see [Jones et al., 2004] for a review. The results herein allow for each sampled
node to refer more than one node. This is a Markov process indexed not by a chain,
but rather by a tree. Denote the referral tree as $\mathbb{T}$. Where the node set of $G$ indexes the
population, the node set of $\mathbb{T}$ indexes the samples. That is, we observe a subset of the
individuals in $G$ with the sample $\{X_\tau\}_{\tau \in \mathbb{T}} \subset V$. An edge $(\sigma, \tau)$ in the referral tree
denotes that sampled individual $X_\sigma$ referred individual $X_\tau$ into the sample.
Mathematically, $\mathbb{T}$ is a rooted tree, i.e. a connected graph with $n$ nodes, no cycles, and a vertex 0
which indexes the seed node. To simplify notation, $\sigma \in \mathbb{T}$ is used synonymously with
$\sigma$ belonging to the vertex set of $\mathbb{T}$.

For each non-root node $\tau \in \mathbb{T}$, denote $p(\tau) \in \mathbb{T}$ as the parent of $\tau$ (i.e. the node
one step closer to the root). This paper presumes that $\{X_\tau\}_{\tau \in \mathbb{T}}$ is a tree-indexed
random walk on $G$, a model introduced by [Benjamini and Peres, 1994]. This
model generalizes a Markov chain on $G$; each transition $X_{p(\tau)} \to X_\tau$ is an
independent and identically distributed Markov transition with transition matrix $P$. Following
[Benjamini and Peres, 1994], we will call this process a $(\mathbb{T}, P)$-walk on $G$. Unless
stated otherwise, it will be presumed throughout that the root node of the random walk
$X_0$ is initialized from the equilibrium distribution $\pi$ of $P$. It follows that $X_\sigma$ has
distribution $\pi$ for all $\sigma \in \mathbb{T}$.

Unless stated otherwise, the results in this paper allow for the transition matrix
$P$ to be constructed from a weighted graph $G$. Let $w_{ij}$ be the weight of the edge
$(i,j) \in E$; if $(i,j) \notin E$, define $w_{ij} = 0$. If the graph is unweighted, then let $w_{ij} = 1$
for all $(i,j) \in E$. Define the degree of node $i$ as $\deg(i) = \sum_j w_{ij}$. If the graph is
unweighted, then $\deg(i)$ is the number of connections to node $i$. Throughout this paper,
the graph is undirected, so $w_{ij} = w_{ji}$ for all pairs $i,j$. Given that $\{X_{p(\tau)} = i\}$, the
probability of $\{X_\tau = j\}$ is proportional to $w_{ij}$:

\[ P(X_\tau = j \mid X_{p(\tau)} = i) = \frac{w_{ij}}{\deg(i)}. \]


We use the term simple random walk for the Markov chain constructed on the
unweighted graph (i.e. $w_{ij} \in \{0,1\}$ for all $i,j$). The simple random walk presumes that
each participant selects a friend uniformly and independently at random from their list
of friends.
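As an illustration of the sampling model just described, the following minimal Python sketch simulates a $(\mathbb{T}, P)$-walk for the simple random walk when $\mathbb{T}$ is an $m$-tree. The toy graph `adj`, the function names, and the choice to return the sample wave by wave are assumptions made for this example only; they are not taken from the paper.

```python
import random


def simple_random_walk_step(adj, i, rng):
    """One transition of the simple random walk: a uniformly chosen neighbour of i."""
    return rng.choice(adj[i])


def tp_walk_on_m_tree(adj, m, height, rng):
    """Simulate a (T, P)-walk where T is an m-tree of the given height.

    Returns a list of waves; waves[l] holds the sampled nodes X_tau with |tau| = l.
    The seed is drawn with probability proportional to degree, which is the
    stationary distribution of the simple random walk on an undirected graph.
    """
    nodes = list(adj)
    degrees = [len(adj[v]) for v in nodes]
    seed = rng.choices(nodes, weights=degrees, k=1)[0]
    waves = [[seed]]
    for _ in range(height):
        next_wave = []
        for parent in waves[-1]:
            # Each sampled node makes m independent referrals (with replacement).
            next_wave.extend(simple_random_walk_step(adj, parent, rng) for _ in range(m))
        waves.append(next_wave)
    return waves


# Hypothetical toy graph: a 6-cycle with one chord between nodes 0 and 3.
adj = {0: [1, 5, 3], 1: [0, 2], 2: [1, 3], 3: [2, 4, 0], 4: [3, 5], 5: [4, 0]}
rng = random.Random(0)
waves = tp_walk_on_m_tree(adj, m=2, height=3, rng=rng)
sample = [x for wave in waves for x in wave]
print([len(w) for w in waves], len(sample))  # wave sizes 1, 2, 4, 8 and n = 15
```

Because referrals are drawn with replacement, the same individual can appear several times in `sample`; this is the resampling feature of the Markov model that the text flags as one of its simplifications.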

In order to estimate $\mu$, we observe $y(X_\tau)$ for all $\tau \in \mathbb{T}$. Because $G$ is undirected, $P$
is reversible and has stationary distribution $\pi$ with $\pi_i \propto \deg(i)$ for all $i \in G$; this fact
is helpful for creating an asymptotically unbiased estimator for $\mu$, particularly under
the simple random walk assumption [Volz and Heckathorn, 2008].

Remark 1. In general, the quantity of interest $\mu = N^{-1} \sum_i y(i)$ is not equal to
$E_\pi(y) = \sum_i y(i)\pi_i$. As such, the sample average of the $y(X_\tau)$'s is a biased estimator for $\mu$. With
inverse probability weighting, define a new function $y'(i) = y(i)(N\pi_i)^{-1}$ and the
respective estimator

\[ \hat\mu_{IPW} = \frac{1}{n} \sum_{\sigma \in \mathbb{T}} \frac{y(X_\sigma)}{N \pi_{X_\sigma}}, \]

where $n = |\mathbb{T}|$ is the sample size. Then, $E_\pi(\hat\mu_{IPW}) = E_\pi(y') = \mu$. As such, the
sample average of the $y'(X_\tau)$'s is an unbiased estimator of $\mu$. Unfortunately, the
values $\pi_i$ are unknown. In practice, RDS participants are asked various questions to
measure how many friends they have in $G$. Under the simple random walk assumption,
$\pi_i$ is proportional to the number of friends of $i$; this result also requires that the edges
in $G$ are undirected, something that will be presumed throughout the paper. Therefore
the Volz-Heckathorn estimator

\[ \hat\mu_{VH} = \frac{\sum_{\sigma \in \mathbb{T}} y(X_\sigma)/\deg(X_\sigma)}{\sum_{\sigma \in \mathbb{T}} 1/\deg(X_\sigma)} \]

is in essence a Hájek estimator based upon $\deg(i)$ [Volz and Heckathorn, 2008].
Under the simple random walk assumption, this estimator provides an asymptotically
unbiased estimator of $\mu$.
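The three estimators discussed in Remark 1 can be compared on a small worked example. The sketch below is illustrative only: the four-node population, its degrees and features, and the function names are hypothetical, and the "idealized sample" in which node $i$ appears exactly $\deg(i)$ times is an assumption chosen so that the sample is exactly proportional to $\pi$ and the arithmetic can be checked by hand.

```python
def sample_average(y_vals):
    """Plain mean of the observed features y(X_sigma)."""
    return sum(y_vals) / len(y_vals)


def ipw_estimate(y_vals, pi_vals, N):
    """(1/n) * sum of y(X_sigma) / (N * pi_{X_sigma}); pi is usually unknown."""
    return sum(y / (N * p) for y, p in zip(y_vals, pi_vals)) / len(y_vals)


def volz_heckathorn(y_vals, deg_vals):
    """Ratio (Hajek-style) estimator that replaces pi by the reported degrees."""
    num = sum(y / d for y, d in zip(y_vals, deg_vals))
    den = sum(1.0 / d for d in deg_vals)
    return num / den


# Hypothetical population: four nodes with degrees 1, 2, 3, 2.
deg = {0: 1, 1: 2, 2: 3, 3: 2}
y = {0: 0.0, 1: 0.0, 2: 1.0, 3: 1.0}
N = len(deg)
total_deg = sum(deg.values())
pi = {i: deg[i] / total_deg for i in deg}  # pi_i proportional to deg(i)

# Idealized sample: node i appears exactly deg(i) times.
sample = [i for i in deg for _ in range(deg[i])]
y_s = [y[i] for i in sample]
deg_s = [deg[i] for i in sample]
pi_s = [pi[i] for i in sample]

mu = sum(y.values()) / N                 # population mean, 2/4
mu_hat = sample_average(y_s)             # 5/8: high-degree nodes are over-represented
mu_ipw = ipw_estimate(y_s, pi_s, N)      # recovers mu up to rounding
mu_vh = volz_heckathorn(y_s, deg_s)      # recovers mu up to rounding
print(mu, mu_hat, mu_ipw, mu_vh)
```

In this example the feature is larger on high-degree nodes, so the sample average overshoots $\mu$, while both weighted estimators return $\mu$ up to floating point rounding.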


For each node $\tau \in \mathbb{T}$, let $|\tau|$ be the distance of the node from the root; this is
also called the "wave" of $\tau$. For every pair of nodes $\sigma, \tau \in \mathbb{T}$, define $d(\sigma, \tau)$ to be
the distance between $\sigma$ and $\tau$ on $\mathbb{T}$ (as a graph). For each non-leaf node $\sigma \in \mathbb{T}$, let
$\eta(\sigma)$ be the number of offspring of $\sigma$. A tree is said to be an $m$-tree of height $h$ if
$\eta(\sigma) = m$ for all $\sigma \in \mathbb{T}$ with $|\sigma| < h$ and $\eta(\sigma) = 0$ for all $|\sigma| = h$. Here, both
$m$ and $h$ are natural numbers (i.e. $m, h \in \mathbb{N}$). $\mathbb{T}$ is said to be Galton-Watson if the
$\eta(\sigma)$ are i.i.d. random variables in $\mathbb{N}$. While the theorems below only study 2-trees, the
computational experiments in Section 6.1 suggest that the conclusions of the analytical
results are highly robust to replacing the 2-tree with a Galton-Watson tree.

[Levin et al., 2009] serves as this paper's key reference for Markov processes.
Following the notation in that text, define $E_\pi(y) = \sum_{i \in V} y(i)\pi_i$ and
$\mathrm{var}_\pi(y) = E_\pi (y - E_\pi(y))^2$ for the function $y$.

There are two primary concerns about the model and estimator used in the main
results below. First, the Markov model allows for resampling. Second, the results
below only apply to $m$-trees, not more general trees. The simulations in Section 6.1
suggest that the analytic results continue to hold under a more realistic setting that
addresses both of these concerns.


3 Main Results 


The threshold $m < \lambda_2^{-2}$ was previously identified in [Rohe, 2015] as being a critical
threshold for the design effect of network driven sampling; beyond this threshold, the
variance of the standard estimator does not decay at the standard rate. In other words,

\[ \mathrm{var}\Big( \frac{1}{\sqrt{|\{\sigma \in \mathbb{T} : |\sigma| \le h\}|}} \sum_{\sigma \in \mathbb{T} : |\sigma| \le h} y(X_\sigma) \Big) \to \infty \]

as $h \to \infty$. As such, using the traditional scaling, no central limit theorem holds above
the critical threshold. Because of this, the theorems focus on the case $m < \lambda_2^{-2}$. When
$m > \lambda_2^{-2}$, the simulations in Section 6.1 suggest that the central limit theorem does
not hold for any scaling.

Theorem 1 is a central limit theorem for an estimator constructed from the
tree-indexed Markov chain. The theorem holds for any function $y$, any reversible transition
matrix with second largest eigenvalue satisfying $|\lambda_2| \ne 1$, and any $m < \lambda_2^{-2}$.

Theorem 1. Suppose that $P$ is a reversible transition matrix with respect to the
equilibrium distribution $\pi$, and that the eigenvalues of $P$ are $1 = \lambda_1 > |\lambda_2| \ge \dots \ge |\lambda_N|$.
Without loss of generality, suppose that $E_\pi(y) = 0$. Define

\[ Y_l = \frac{1}{\sqrt{m^l}} \sum_{\tau \in \mathbb{T} : |\tau| = l} y(X_\tau). \]

If $\mathbb{T}$ is an $m$-tree with $m < \lambda_2^{-2}$, then

\[ \frac{1}{\sqrt{h}} \sum_{l=1}^{h} Y_l \to N(0, \sigma_0^2) \]

in distribution, where $\sigma_0^2 = \mathrm{var}_\pi((\sqrt{m}P - I)^{-1} y) - \mathrm{var}_\pi(P(\sqrt{m}P - I)^{-1} y)$.

The random variables considered in Theorem 1 are not exactly sample
averages, but a reweighted form of the sample average. Samples in the same wave are
equally weighted, while samples from different waves are not. The following theorem
provides a theoretical guarantee on the distribution of the sample average for a specific class
of transition matrices and node features.


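Theorem 2. Let $\mathbb{T}$ be a 2-tree. Without loss of generality, suppose that $E_\pi(y) = 0$.
Define $\hat\mu_h = \frac{1}{\sqrt{2^{h+1} - 1}} \sum_{\sigma \in \mathbb{T}, |\sigma| \le h} y(X_\sigma)$. Suppose that

(c1) $E(\hat\mu_h^{2k+1}) = 0$ for all $h, k \in \mathbb{N}$;

(c2) for any function $f$ on $V$ satisfying $E_\pi f = 0$, $\|Pf\|_\infty \le |\lambda_2| \, \|f\|_\infty$;

(c3) $|\lambda_2| < 1/\sqrt{2}$;

then

\[ \hat\mu_h \to N(0, \sigma_0^2) \]

in distribution for some $\sigma_0^2$.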

Remark 2. Condition (c1) is a technical condition on the symmetry of $\hat\mu_h$ that is
necessary in the proof. The following proposition provides a sufficient condition for (c1).

Proposition 1. Suppose that $y$ is symmetric, i.e. for any $i \in V$ there exists $j$ such
that $y(j) = -y(i)$. If $p(u,v) = P(y(X_\sigma) = v \mid y(X_{p(\sigma)}) = u)$ is well-defined and
$p(u, v) = p(-u, -v)$ for all $u, v \in y(V)$, then condition (c1) is satisfied.

Proof. Under the conditions of the proposition, the distribution of $\hat\mu_h$ is symmetric
with respect to 0. Thus $E(\hat\mu_h^{2k+1}) = 0$ for all $h, k \in \mathbb{N}$. □

Conditions (c2)-(c3) can be substituted by the following condition (c2'):

(c2') There exists $c < 1/\sqrt{2}$ such that for any function $f$ on $V$ satisfying $E_\pi f = 0$,
$\|Pf\|_\infty \le c \|f\|_\infty$.

Condition (c2') is weaker than (c2) and (c3) combined, but is stronger than (c3)
alone. To see this, let $f$ be the eigenfunction of the second eigenvalue, and it follows
that $|\lambda_2| < 1/\sqrt{2}$. It can be easily seen that one necessary condition for (c2') is that

\[ \sum_j |P_{ij} - \pi_j| < \sqrt{2} \]

for all $i \in V$. In other words, all the rows of $P$ must be close to $\pi$. As previously
discussed, condition (c3) is actually a necessary condition for the central limit theorem
[Rohe, 2015] in the sense that the variance of $\hat\mu_h$ tends to infinity if $|\lambda_2| > 1/\sqrt{2}$.

For clarity in the exposition of the theorem and the proof, we have only proved 
the theorem for the 2-tree. We believe that similar results are likely to hold for more 
general m-trees. 

3.1 Extension to the Volz-Heckathorn estimator 

When $P$ is restricted to be the transition matrix of the simple random walk on $G$, the
following corollary shows that Theorem 2 can be extended to the Volz-Heckathorn
estimator [Volz and Heckathorn, 2008].

Denote $\bar d = \frac{1}{N} \sum_i \deg(i)$ as the average node degree. Following Remark 1, the
IPW estimator contains $1/(N\pi_i)$, which is equal to $\bar d/\deg(i)$. The Volz-Heckathorn
estimator first estimates $\bar d$ with the harmonic mean of the observed degrees. Because this
harmonic mean converges to $\bar d$ in probability, the following corollary applies Slutsky's
Theorem to give a central limit theorem for the Volz-Heckathorn estimator.

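Corollary 1. Let $\mathbb{T}$ be a 2-tree. Suppose in particular that $P$ is the transition matrix of
the simple random walk on $G$. Define a new node feature $y'(i) = y(i)/\deg(i)$. Without
loss of generality, suppose that $E_\pi y' = 0$ (this is not equivalent to $E_\pi y = 0$). Define

\[ \hat\mu_{h,VH} = \hat\mu_h \hat d = \frac{1}{\sqrt{2^{h+1} - 1}} \sum_{\sigma \in \mathbb{T}, |\sigma| \le h} y'(X_\sigma) \, \hat d, \]

where

\[ \hat d = \frac{2^{h+1} - 1}{\sum_{\sigma \in \mathbb{T}, |\sigma| \le h} 1/\deg(X_\sigma)}. \]

If the new node feature $y'$ and the transition matrix $P$ satisfy conditions (c1)-(c3) in
Theorem 2, then

\[ \hat\mu_{h,VH} \to N(0, \sigma_{VH}^2) \]

in distribution for some $\sigma_{VH}^2$.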


3.2 Illustrating the conditions with a blockmodel

Consider $G$ as coming from a blockmodel with two blocks [Lorrain and White, 1971].
In this blockmodel, each node $i$ is given a label $z(i) \in \{1,2\}$ and every edge weight
$w_{ij} = B_{z(i)z(j)}$ for some symmetric $2 \times 2$ matrix $B$. Suppose that $y_i = y_j$ if $z(i) =
z(j)$. This model was previously studied in [Goel and Salganik, 2009] and it serves as
an approximation to the Stochastic Blockmodel.

Given the structural equivalence of nodes within the same block, it is sufficient to
study the transition matrix between blocks, $\mathcal{P} \in \mathbb{R}^{2 \times 2}$. If $\mathcal{P}$ is a symmetric matrix
with entries $p_{11} = p_{22} = p$ and $p_{12} = p_{21} = 1 - p$ for some value $p$, then it can be
easily verified that condition (c1) is satisfied. Moreover, if $|2p - 1| < 1/\sqrt{2}$, then conditions
(c2) and (c3) are also satisfied. Our theorem asserts that the Volz-Heckathorn estimator
converges to the normal distribution in this model.

More generally, suppose that the nodes in the blockmodel for $G$ are equally
balanced between $2K$ blocks with node features $\{y_1, -y_1, \dots, y_K, -y_K\}$ and that the
transition matrix is $p(u,v) = p \, 1(u = v) + [\ldots]$. One can verify that all the
conditions are satisfied as long as $[\ldots] < p < [\ldots]$, where the bounds depend on $K$.


4 Comparing the variance of inverse probability weighting to the bias of the sample average


An estimator with a small mean squared error (i.e. $E(\hat\mu - \mu)^2$) has a small bias and
a small variance. It is generally known that inverse probability weighting provides
an unbiased estimator of $\mu$. However, survey weights can also drastically inflate the
variance of the estimator. This matter has been heavily studied by survey
statisticians, and a substantial literature has been devoted to the methodologies and issues
regarding the use of sampling weights; see [Pfeffermann, 1996], [Biemer and Christ, 2007],
[Valliant et al., 2013], and [Bollen et al., 2016] for a review. To determine whether one
should use sampling weights in RDS, this section gives a test statistic for the null
hypothesis that the sample average is unbiased. The results in the previous section suggest
a confidence region for this test statistic.

Denote $n = |\mathbb{T}|$ as the number of samples. The next results compare $\hat\mu_{IPW}$ to the
sample average

\[ \hat\mu = \frac{1}{n} \sum_{\tau \in \mathbb{T}} y(X_\tau). \]

Proposition 2 and Theorem 3 highlight the dangers of inverse probability weighting
by showing that it can increase the variance. Proposition 2 studies the simplified case
where the samples are i.i.d. from the stationary distribution. Following the proposition,
Theorem 3 studies the more relevant setting of the $(\mathbb{T}, P)$-walk on $G$. To simplify the
statements of Proposition 2 and Theorem 3 and their proofs, the node features $y(i)$
are presumed to be random variables. This condition could be removed with further
technical conditions on the moments of $y : V \to \mathbb{R}$ and its relationship to $\pi$.


Proposition 2. Suppose that $X_1, \dots, X_n$ are sampled independently from the
stationary distribution $\pi$ and that $y(1), \dots, y(N)$ are $N$ uncorrelated and identically
distributed random variables with finite second moment $\mu_2$. Let $C_1 = \max_{1 \le i \le N} N\pi_i$
and $\mathrm{var}(\pi) = \frac{1}{N} \sum_i (\pi_i - \frac{1}{N})^2$. Then

\[ \mathrm{var}(\hat\mu_{IPW}) - \mathrm{var}(\hat\mu) \ge \mu_2 N \big( [\ldots] - 1 \big) \mathrm{var}(\pi). \quad (1) \]

Thus, as long as $N > n$, which can be easily satisfied in practice,

\[ \mathrm{var}(\hat\mu_{IPW}) \ge \mathrm{var}(\hat\mu). \]

Theorem 3. Suppose that $\{X_\tau : \tau \in \mathbb{T}\}$ is a sample from the $(\mathbb{T}, P)$-walk on $G$, and
that $y(1), \dots, y(N)$ are $N$ uncorrelated and identically distributed random variables
with finite second moment $\mu_2$. Assume that there exist constants $C_1$, $C_2$ and $C_3$ (not
the same constants as in Proposition 2) such that $C_1 N \le d_i \le C_2 N$ for all $i$ and
$N^2 \mathrm{var}(\pi) \ge C_3$. Then

\[ \mathrm{var}(\hat\mu_{IPW}) - \mathrm{var}(\hat\mu) \ge \mu_2 \big( [\ldots] \, \mathrm{var}(\pi) - [\ldots] \big), \quad (2) \]

where the second term involves $C_1$, $C_2$ and $N$, and there exists $C$ independent of $n$
such that $\mathrm{var}(\hat\mu_{IPW}) \ge \mathrm{var}(\hat\mu)$ as long as $N \ge Cn$.

Proposition 2 shows that as $\mathrm{var}(\pi)$ increases, the difference between the variance
of $\hat\mu_{IPW}$ and $\hat\mu$ becomes larger. Similarly, Theorem 3 also shows that the difference
between the variance of $\hat\mu_{IPW}$ and $\hat\mu$ increases as $\mathrm{var}(\pi)$ increases, given that the
relative upper and lower bounds of $d$ (i.e. $C_1$ and $C_2$) remain fixed. Recall that when
$P$ is the simple random walk, the probabilities $\pi_i$ are proportional to node degree. An
extensive literature (e.g. [Strogatz, 2001; Clauset et al., 2009]) has found that empirical
networks have highly heterogeneous node degrees. As such, Equation (1) shows that the
variance of $\hat\mu_{IPW}$ can be dramatically greater than the variance of $\hat\mu$. Moreover, both
Proposition 2 and Theorem 3 suggest that the variance difference $\mathrm{var}(\hat\mu_{IPW}) - \mathrm{var}(\hat\mu)$
can be considerable if we sample only a small proportion of the whole population. This
problem is particularly salient when the population is large.

The bias-variance tradeoff presents a dilemma between inverse probability
weighting and the sample average. This bias can be estimated. For every $i \in G$, define a new
node feature,

\[ y'(i) = y(i)\Big(1 - \frac{1}{N\pi_i}\Big) = y(i)\Big(1 - \frac{\bar d}{\deg(i)}\Big). \quad (3) \]


Proposition 3. The mean of the new node feature satisfies

\[ E_\pi(y') = E_\pi(y) - \mu, \]

which is the true bias of the sample average. Therefore, under the null hypothesis

\[ H_0 : E_\pi(y') = 0, \]

the sample average is an unbiased estimator. If $\pi$ and $y$ are uncorrelated (i.e.
$\sum_i \pi_i y(i) = \frac{1}{N} \sum_i y(i)$), then $H_0$ is satisfied and the sample average is unbiased.

Under the conditions of Proposition 2, $\pi$ and $y$ satisfy the condition of Proposition
3 in expectation (i.e. they are uncorrelated in expectation). Both of these conditions
imply that the outcome is unrelated to the sampling weight (in some way). Under
such conditions, both estimators are unbiased. If it is also true that $\mathrm{var}(\pi)$ is large,
then $\hat\mu$ has a smaller variance. In this scenario, $\hat\mu$ is preferable to $\hat\mu_{IPW}$ in terms of
MSE. However, if the sample average is biased, then we must compare the bias and
variance of the two estimators. Asymptotically, the variance of both estimators will
vanish, while the bias stays constant. So, for a sufficiently large sample size, one should
use $\hat\mu_{IPW}$. For smaller sample sizes, the bias of the sample average could be small
(relative to the difference in the variances). In such settings, there will be a crossover
point, a sample size at which $\hat\mu_{IPW}$ becomes preferable to $\hat\mu$. To distinguish between
these two cases, we want to test the null hypothesis that the sample average is unbiased,
i.e. $E_\pi(y') = 0$. Or, more generally, we want to provide a confidence region for the
bias of the sample average.

If $\bar d$ is unknown, as is generally the case, we can estimate $\bar d$ by the harmonic mean
[Salganik and Heckathorn, 2004]

\[ \hat d = \frac{n}{\sum_{\tau \in \mathbb{T}} 1/\deg(X_\tau)}. \]

Substituting $\hat d$ for $\bar d$ in Equation (3) yields the new node feature based on the
Volz-Heckathorn estimator [Volz and Heckathorn, 2008]

\[ y'_{VH}(i) = y(i)\Big(1 - \frac{\hat d}{\deg(i)}\Big). \]


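Similarly, define

\[ \widehat{bias} = \frac{1}{n} \sum_{\sigma \in \mathbb{T}} y'_{VH}(X_\sigma) = \frac{1}{n} \sum_{\sigma \in \mathbb{T}} y(X_\sigma)\Big(1 - \frac{\hat d}{\deg(X_\sigma)}\Big) = \hat\mu - \hat\mu_{VH}, \quad (4) \]

then $\widehat{bias} = \hat\mu - \hat\mu_{VH}$ is an asymptotically unbiased estimator for the bias of $\hat\mu$. It
serves as a test statistic for the null hypothesis $H_0 : E_\pi(y') = 0$.

The theorems above suggest that $\widehat{bias}$ converges to the normal distribution. The
rejection region is then

\[ W = \{ |\widehat{bias}| > 1.96 \, \hat\sigma_0 \}, \]

where $\hat\sigma_0^2$ is an estimate of the variance.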
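The test statistic can be computed directly from the observed features and reported degrees. The following is a minimal sketch under stated assumptions: the sample values are hypothetical, the function names are invented for this example, and the standard error `sigma_hat` is taken as given (one way to estimate it is the plug-in estimator of Section 5).

```python
def harmonic_mean_degree(deg_vals):
    """d_hat = n / sum(1 / deg(X_tau)), the harmonic mean of observed degrees."""
    return len(deg_vals) / sum(1.0 / d for d in deg_vals)


def bias_statistic(y_vals, deg_vals):
    """Average of y'_VH(X_sigma) = y(X_sigma) * (1 - d_hat / deg(X_sigma))."""
    d_hat = harmonic_mean_degree(deg_vals)
    n = len(y_vals)
    return sum(y * (1.0 - d_hat / d) for y, d in zip(y_vals, deg_vals)) / n


def reject_unbiased_null(bias_hat, sigma_hat, z=1.96):
    """Two-sided test of H0: the sample average is unbiased (normal approximation)."""
    return abs(bias_hat) > z * sigma_hat


# Hypothetical sample of n = 8 referrals: features and reported degrees.
y_s = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0]
deg_s = [1, 2, 2, 3, 3, 3, 2, 2]

bias_hat = bias_statistic(y_s, deg_s)

# Cross-check against the identity bias_hat = mu_hat - mu_hat_VH from Equation (4).
mu_hat = sum(y_s) / len(y_s)
mu_vh = sum(y / d for y, d in zip(y_s, deg_s)) / sum(1.0 / d for d in deg_s)
print(bias_hat, mu_hat - mu_vh)

# sigma_hat values below are placeholders, not estimates from data.
print(reject_unbiased_null(bias_hat, sigma_hat=0.05))
print(reject_unbiased_null(bias_hat, sigma_hat=0.10))
```

With these numbers the harmonic mean degree is 2 and the estimated bias is 1/8, so the null is rejected for the smaller placeholder standard error and retained for the larger one.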


5 Estimating the variance 

For some node feature $y$ (e.g. HIV status $y$ or the $y'$ in Equation (3) that motivates $\widehat{bias}$),
let $\hat\mu$ denote the sample average. Denote $\sigma_{\hat\mu}^2$ as $\mathrm{Var}_{\mathbb{T},P}(\hat\mu)$, where the subscript $\mathbb{T}, P$
denotes that the data is collected via a $(\mathbb{T}, P)$-walk on $G$. This section studies a
simple plug-in estimator for $\sigma_{\hat\mu}^2$.

The following function is essential to expressing $\sigma_{\hat\mu}^2$ [Rohe, 2015].

Definition 1. Select two nodes $I, J$ uniformly at random from the tree $\mathbb{T}$. Define the
random variable $D = d(I, J)$ to be the graph distance in $\mathbb{T}$ between $I$ and $J$. Define
$\mathbb{G}$ as the probability generating function for $D$,

\[ \mathbb{G}(z) = E(z^D). \]

In practice, $\mathbb{T}$ is observed. So, the function $\mathbb{G}$ can be computed. In many studies
there are multiple seed nodes. In these cases, we suggest computing $d(I, J)$ on a tree
which has an artificial root node that connects to all of the seeds; this root node could
be imagined as an individual that is responsible for finding the seed nodes. In this tree,
two different seed nodes would be distance 2 apart.

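Denote the autocorrelation at lag 1 of $y(X_\tau)$ as

\[ R = \frac{\mathrm{Cov}(y(X_{p(\tau)}), y(X_\tau))}{\mathrm{var}_\pi(y)}. \]

Both $\mathrm{Cov}(y(X_{p(\tau)}), y(X_\tau))$ and $\mathrm{var}_\pi(y)$ can be estimated with plug-in quantities.
Because the data has been sampled proportional to $\pi$, the plug-in quantity for $\mathrm{var}_\pi(y)$ should
not explicitly adjust for $\pi$:

\[ \widehat{\mathrm{var}}_\pi(y) = \frac{1}{n} \sum_{\tau \in \mathbb{T}} (y(X_\tau) - \hat\mu_{VH})^2. \]

Similarly,

\[ \widehat{\mathrm{Cov}}(y(X_{p(\tau)}), y(X_\tau)) = \frac{1}{n-1} \sum_{\tau \in \mathbb{T} \setminus 0} (y(X_{p(\tau)}) - \hat\mu_{VH})(y(X_\tau) - \hat\mu_{VH}), \]

where $\{\mathbb{T} \setminus 0\}$ contains all nodes except the root node 0 (because $p(0)$ does not exist).
Using these plug-in quantities, define $\hat R$. Then, the estimator is

\[ \hat\sigma_{\hat\mu}^2 = \mathbb{G}(\hat R) \, \widehat{\mathrm{var}}_\pi(y). \]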
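The plug-in estimator can be sketched in a few lines once the referral tree is stored as a parent list. The sketch below rests on several assumptions that are not stated in the text: nodes are indexed so that every parent has a smaller index than its children, $I$ and $J$ are drawn independently (so $I = J$ and $D = 0$ are possible), the centering value is supplied by the caller, and the covariance is averaged over the $n - 1$ non-root nodes. All names are hypothetical.

```python
from collections import Counter


def tree_distance(parent, depth, a, b):
    """Graph distance in the tree between nodes a and b (walk up to the common ancestor)."""
    dist = 0
    while a != b:
        if depth[a] < depth[b]:
            a, b = b, a          # make a the deeper (or equally deep) node
        a = parent[a]
        dist += 1
    return dist


def distance_pgf(parent):
    """Return G(z) = E(z^D) for I, J drawn independently and uniformly from the tree."""
    n = len(parent)
    depth = [0] * n
    for k in range(1, n):
        depth[k] = depth[parent[k]] + 1   # assumes parent[k] < k
    counts = Counter(
        tree_distance(parent, depth, a, b) for a in range(n) for b in range(n)
    )
    return lambda z: sum(c * z ** d for d, c in counts.items()) / (n * n)


def plug_in_variance(parent, y_vals, center):
    """sigma_hat^2 = G(R_hat) * var_hat, with R_hat the lag-1 autocorrelation."""
    n = len(y_vals)
    var_hat = sum((y - center) ** 2 for y in y_vals) / n
    cov_hat = sum(
        (y_vals[parent[k]] - center) * (y_vals[k] - center) for k in range(1, n)
    ) / (n - 1)
    r_hat = cov_hat / var_hat
    return distance_pgf(parent)(r_hat) * var_hat


# Hypothetical referral tree: a root (index 0) with two children.
parent = [None, 0, 0]
G = distance_pgf(parent)
print(G(1.0), G(0.0))   # G(1) = 1 always; G(0) = 1/n, the i.i.d. variance factor

y_obs = [1.0, 0.0, 1.0]
sigma2_hat = plug_in_variance(parent, y_obs, center=0.5)  # center is a placeholder value
print(sigma2_hat)
```

For this three-node tree $\mathbb{G}(z) = (3 + 4z + 2z^2)/9$. When $\hat R = 0$ the estimator reduces to $\widehat{\mathrm{var}}_\pi(y)/n$, the usual variance of a mean of independent draws, which is a useful sanity check on the independent-draw convention chosen for $D$.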

A popular bootstrap technique for estimating $\sigma_{\hat\mu}^2$ resamples $y(X_\tau)$ as a Markov
process (i.e. in addition to $X_\tau$ being a Markov process, the bootstrap procedure also
assumes that $y(X_\tau)$ is Markov) [Salganik, 2006]. This model is akin to the
blockmodel with two blocks in Section 3.2. The following assumption is weaker than this
assumption.

Assumption 1: $y(i) = \mu + \sigma f(i)$, where $\mu, \sigma \in \mathbb{R}$ and $f : V \to \mathbb{R}$ is an
eigenfunction of $P$ with $\mathrm{var}_\pi(f) = 1$.


Proposition 4. Under Assumption 1,

\[ \sigma_{\hat\mu}^2 = \mathbb{G}(R) \, \mathrm{var}_\pi(y). \]


While Assumption 1 is weaker than the previous assumption in [Salganik, 2006],
the next proposition highlights the danger of this assumption. It uses a different,
rather weak assumption.

Assumption 2: $\mathbb{G}$ is convex on $[\lambda_{min}, 1]$, where $\lambda_{min}$ is the smallest eigenvalue of
$P$.

Because $\mathbb{G}$ is a probability generating function, it is always convex on $[0,1]$. As
such, we only need to be worried about negative values. Recall that the central limit
theorems above only hold when $|\lambda_{min}| < 1/\sqrt{2} \approx .7$ (the smallest possible value
for $\lambda_{min}$ is $-1$). Some simulated trees given in the appendix suggest that if $\mathbb{G}$ is not
convex, it often fails in the neighborhood of $-1$. As such, the assumption that $|\lambda_{min}| <
1/\sqrt{2} \approx .7$ is likely to imply Assumption 2. In practice, one observes the referral tree
$\mathbb{T}$. Thus, one can compute the second derivative of $\mathbb{G}$. Eigenvalues of $P$ close to
negative one arise in antithetic sampling, where adjacent samples are dissimilar. For
example, if the population in $G$ was heterosexuals and edges in $G$ represent sexual
contact, then men would only refer women and vice versa. In this case, $\lambda_{min}$ would be
exactly $-1$. While easily imagined, such settings are not current practice for RDS. As
such, large and negative values are uncommon; $\lambda_{min}$ is likely close to zero.

The following proposition follows from an application of Jensen’s inequality. 

Proposition 5. Under Assumption 2,

\[ \sigma_{\hat\mu}^2 \ge \mathbb{G}(R) \, \mathrm{var}_\pi(y). \]

Because Assumption 2 is not very restrictive, the inequality in Proposition 5
highlights the danger in breaking Assumption 1 (and thus the Markov model in [Salganik, 2006]);
breaking Assumption 1 will lead to $\hat\sigma_{\hat\mu}^2$ underestimating the variance.

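5.1 A bias adjusted estimator for $\mu$

The test above also allows us to derive a more robust estimator of $\mu$. Define

\[ \widehat{bias} = \frac{1}{n} \sum_{\sigma \in \mathbb{T}} y'(X_\sigma), \]

with $y'$ as in Equation (3). Then $\widehat{bias} = \hat\mu - \hat\mu_{IPW}$. Using the hypothesis test to choose
between the sample average and inverse probability weighting is akin to hard thresholding
the bias adjustment. Define

\[ \widehat{bias}_t = \begin{cases} \widehat{bias} & \text{if reject } H_0 \\ 0 & \text{otherwise.} \end{cases} \]

The final estimator of $\mu$ is then $\hat\mu_{BA} = \hat\mu - \widehat{bias}_t$ (BA for bias adjustment). This
estimator is explored in Section 6.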
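The hard-thresholding rule amounts to a single conditional. A minimal sketch, assuming a normal-approximation test with critical value 1.96 and placeholder inputs (the function name and the numeric values are hypothetical):

```python
def bias_adjusted_estimate(mu_hat, bias_hat, sigma_hat, z=1.96):
    """Subtract the estimated bias only when the test rejects H0 (hard thresholding)."""
    bias_t = bias_hat if abs(bias_hat) > z * sigma_hat else 0.0
    return mu_hat - bias_t


# Placeholder inputs: a sample average of 0.625 with an estimated bias of 0.125.
adjusted = bias_adjusted_estimate(0.625, 0.125, sigma_hat=0.05)    # test rejects
unadjusted = bias_adjusted_estimate(0.625, 0.125, sigma_hat=0.10)  # test does not reject
print(adjusted, unadjusted)
```

When the test rejects, the result coincides with the weighted estimator ($\hat\mu - \widehat{bias}$); otherwise it is the plain sample average, so the rule switches between the two estimators rather than blending them.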

6 Numerical results 

6.1 Simulation 

In this section we illustrate the theoretical results on simulated data. The simulations
are performed on networks simulated from the Stochastic Blockmodel [Holland et al., 1983].
The four colors in Figure 1 correspond to four different networks that are simulated
from four different Stochastic Blockmodels. Each of the four networks has $N = 5{,}000$
nodes, equally balanced between group zero and group one. The probability of a
connection between two nodes in different blocks is $r$ and the probability of
connection between two nodes in the same block is $p$. To control the eigenvalues of the
$5000 \times 5000$ transition matrix, consider the transition matrix between classes given by
$\mathcal{P} = E(D)^{-1} E(A)$. The second eigenvalue of $\mathcal{P}$ is [Rohe et al., 2011]

\[ \lambda_2(\mathcal{P}) = \frac{p - r}{p + r}, \]

where expectations are under the Stochastic Blockmodel. In our simulation, the second
eigenvalue of the actual transition matrix is typically very close to $\lambda_2(\mathcal{P})$. We take
$p + r = 0.01$ in all four Stochastic Blockmodels so that the average degree is about 25.
As such, $\lambda_2(\mathcal{P})$ is actually controlled by $p - r$.
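The relation $\lambda_2(\mathcal{P}) = (p - r)/(p + r)$ can be inverted to pick the connection probabilities for a target eigenvalue. The short sketch below does this arithmetic for the four eigenvalues that appear in the simulation discussion (0.5, 0.6, 0.8, 0.9) with $p + r = 0.01$; the function name is hypothetical and the expected-degree formula $(N/2)(p + r)$ ignores the negligible self-pair correction.

```python
def sbm_probabilities(lambda2, total=0.01):
    """Solve (p - r) / (p + r) = lambda2 subject to p + r = total."""
    p = total * (1.0 + lambda2) / 2.0
    r = total * (1.0 - lambda2) / 2.0
    return p, r


N = 5000
settings = {}
for lam in (0.5, 0.6, 0.8, 0.9):
    p, r = sbm_probabilities(lam)
    expected_degree = (N / 2) * (p + r)   # roughly N/2 potential neighbours per block
    settings[lam] = (p, r, expected_degree)
    print(lam, p, r, expected_degree)
```

For example, $\lambda_2 = 0.5$ corresponds to $p = 0.0075$ and $r = 0.0025$, and every setting keeps the expected degree at about 25.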

For each of the four networks we carry out four different sampling designs. Let
$\mathbb{T}$ be either a 2-tree or a Galton-Watson tree with $E(\eta(\sigma)) = 2$. For the
Galton-Watson tree, the distribution of $\eta(\sigma)$ is uniform on $\{1, 2, 3\}$. For each $\mathbb{T}$, we consider
both with replacement sampling (i.e. the $(\mathbb{T}, P)$-walk on $G$) and without replacement
sampling (i.e. referrals are sampled uniformly from the friends that have not yet been
sampled). Note that the conditions of Theorem 2 may be violated when either the
Galton-Watson tree or without-replacement sampling is used. We take the first 8 waves
of $\mathbb{T}$ as our sample. As such, the sample size is roughly $N/10$. For each social network
and sampling design, we repeat the sampling process 2000 times and compute $\hat\mu =
\frac{1}{n} \sum_{i=1}^{n} y(X_i)$ for each sample. The Quantile-Quantile (Q-Q) plot of $\hat\mu$ is shown in the
left panel of Figure 1; note that the Q-Q plot centers and scales each distribution to have
mean zero and standard deviation one. In addition, we repeat the above simulation for
the Volz-Heckathorn estimator, and the Q-Q plot of $\hat\mu_{VH}$ is shown in the right panel of
Figure 1.

It is clear from Figure 1 that there are two patterns of distribution: when $\lambda_2 <
1/\sqrt{m} \approx 0.7$, i.e. $\lambda_2 = 0.5$ or $0.6$, the Q-Q plots of $\hat\mu$ and $\hat\mu_{VH}$ approximately lie on the
line $y = x$ for all sampling designs; when $\lambda_2 > 1/\sqrt{m} \approx 0.7$, i.e. $\lambda_2 = 0.8$ or $0.9$, the
Q-Q plots of $\hat\mu$ and $\hat\mu_{VH}$ depart from the line $y = x$. Taken together, Figure 1 suggests
that the distributions of $\hat\mu$ and $\hat\mu_{VH}$ converge to a Gaussian distribution if and only if
$m < \lambda_2^{-2}$. In fact, the right panel of Figure 1 suggests that there are two modes in
the asymptotic distribution of $\hat\mu$ and $\hat\mu_{VH}$ when $m > \lambda_2^{-2}$. The relationship between
the expectation of the offspring distribution and the second eigenvalue of the social
network determines the asymptotic distribution of RDS estimators, regardless of the
node feature, the particular structure of the tree or the way we handle replacement.


6.2 Analysis of Adolescent Health data 

To illustrate Theorem 1 with the test statistic in Section 4, this section presents
simulation results that use the Comm15 friendship network from the National Longitudinal
Survey of Adolescent Health. This simulation compares the MSE of $\hat\mu$ and $\hat\mu_{IPW}$ for
two different node features $y$. When $y$ is correlated with $\pi$, then $\hat\mu_{IPW}$ has a smaller
MSE. When $y$ is weakly correlated with $\pi$, then $\hat\mu$ has a smaller MSE. For settings in
which $\hat\mu_{IPW}$ clearly outperforms $\hat\mu$, the test statistic from Section 4 rejects the null
hypothesis that the sample average is unbiased (i.e. $H_0 : E_\pi \hat\mu = \mu$).

Figure 1: Q-Q plots of the sample average (left panel) and the Volz-Heckathorn
estimator (right panel) for different social networks and sampling designs. For each
scenario we draw 2000 network driven samples of size $\approx 500$ from a network
containing 5,000 nodes. Here the threshold for $\lambda_2$ is $1/\sqrt{2} \approx 0.707$. For the two settings
with $\lambda_2 < 1/\sqrt{2}$, the distributions appear normal. However, for the two settings with
$\lambda_2 > 1/\sqrt{2}$, the distributions do not appear normal. Across all values of $\lambda_2$, there is
no apparent difference between the four different designs (i.e. with replacement sampling
vs without replacement sampling and 2-tree vs Galton-Watson tree).

In the Comm15 network, $N = 1089$ students from two sister schools were asked to
list up to 10 friends; these friends can be inside or outside of the school. The students
also supplied information including their gender, grade and race. The analysis below
studies two node covariates: gender (0/1) and total nominations of friends (integer
between 0 and 10). Before the simulation, the network was symmetrized (i.e. consider
the new adjacency matrix $\tilde A = A + A^T$), yielding a network with average node degree
$\bar d = 8.06$. Because the students were only allowed to list up to 10 friends, the standard
deviation of the degrees is 4.7. This is drastically smaller than in typical social networks.
However, even in this setting, the variance of $\pi$ is sufficiently large to illustrate the
advantages of the sample average.

For both gender and the number of nominations, Table 1 displays (i) the correlation
between $\pi$ and y, (ii) the bias of the sample average, and (iii) the crossover point. Recall
that the crossover point is the sample size at which $\hat\mu_{IPW}$ has a smaller MSE than
$\hat\mu$; this calculation is based upon the simulations described below. The table shows that
gender is weakly correlated with $\pi$. As such, the sample average has a small bias and
the crossover point is large. Contrast this with the number of total nominations, which
is highly correlated with $\pi$. This makes the sample average clearly biased. Because of
this, it has a small crossover point. These two examples illustrate a range of possibilities
in terms of $\mathrm{cor}(\pi, y)$.

                     cor(π, y)   Bias of sample average   Crossover point (sample size)
Gender               0.11        0.037                    > 500
Total nominations    0.54        1.1                      20

Table 1: Gender has a weak correlation with the sample weights. As such, $\hat\mu$ has a
small bias. The crossover point shows that if the samples were drawn independently
from the distribution $\pi$, then $\hat\mu$ has a smaller MSE when n < 500. Total nominations
has a larger correlation and thus a larger bias. When the sample size exceeds twenty,
$\hat\mu_{IPW}$ outperforms the sample average.

Before estimating the crossover points shown in Table 1, we first study the hypothesis
test $H_0: E\hat\mu = \mu$ for both gender and total nominations. To provide a
benchmark, this simulation compares RDS to independent sampling. Let P be the
transition matrix of the simple random walk on the network. The second largest eigenvalue
of P is $\lambda_2 = 0.93$. Let T be a sample from the Galton-Watson process with
$E\eta(\sigma) = 1.1 < \lambda_2^{-2}$. For a node covariate y (gender or nominations), let y' be the
node feature defined in Section 3. Recall that $E_\pi(y')$ is the bias of the sample
average. We consider the following methods of generating samples $Y_1, \dots, Y_n$ and
computing or estimating the variance $\sigma^2$.

1. $Y_1, \dots, Y_n$ is an independent and identically distributed sample from the normal
   distribution $N(E_\pi(y'), \mathrm{var}_\pi(y'))$ and $\sigma^2 = \mathrm{var}_\pi(y')$. Here the variance
   is known.

2. $Y_i = y'(X_i)$ for all i, where $X_1, \dots, X_n$ is an independent and identically distributed
   sample from the equilibrium distribution of P, and $\sigma^2 = \mathrm{var}_\pi(y')$. Here
   the variance is known.

3. $Y_i = y'(X_i)$ for all i, where $X_1, \dots, X_n$ is a sample from the (T, P)-walk on G,
   and $\sigma^2$ is the true variance that is only computable in a simulation,
   $\mathrm{var}\left(n^{-1/2} \sum_{i=1}^{n} Y_i\right)$.

4. $Y_i = y'(X_i)$ for all i, where $X_1, \dots, X_n$ is a sample from the (T, P)-walk on G,
   and $\hat\sigma^2 = \mathrm{var}_\pi(y')\, G(R)$, the estimator discussed earlier in the paper.

For a sample of size n, let the rejection region be

$$W = \left\{ \left| \frac{1}{\sigma\sqrt{n}} \sum_{i=1}^{n} Y_i \right| > 1.96 \right\},$$

with $Y_i$ and $\sigma$ as defined above. Here, the null hypothesis is rejected at $\alpha = 0.05$.
For each scenario, Figure 2 plots the power of our test, pr(W), as a function of sample
size. The power under scenario (1) can be calculated exactly and serves as a benchmark
(black line). The power under scenarios (2)-(4) is calculated by taking 1000 independent
samples and counting the number of samples that fall in W (red, blue, yellow lines
respectively).
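The exact benchmark power of scenario (1) can be sketched as follows. This is only an illustration under stated assumptions: the rejection region is taken to be $|\sum_i Y_i| / (\sigma\sqrt{n}) > 1.96$, and the $Y_i$ are independent normal draws with mean `bias` and standard deviation `sigma`. The function names and the numerical inputs are hypothetical, not values from the paper.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def exact_power(bias, sigma, n, critical=1.96):
    """Probability of rejecting H0 when Y_1..Y_n are i.i.d. N(bias, sigma^2).

    The statistic sum(Y_i) / (sigma * sqrt(n)) is N(shift, 1) with
    shift = sqrt(n) * bias / sigma, so the power is a two-sided tail probability.
    """
    shift = math.sqrt(n) * bias / sigma
    return normal_cdf(-critical - shift) + 1.0 - normal_cdf(critical - shift)


def in_rejection_region(values, sigma, critical=1.96):
    """True when the observed sample falls in the (assumed) rejection region W."""
    n = len(values)
    return abs(sum(values) / (sigma * math.sqrt(n))) > critical


# With zero bias the power equals the nominal level of roughly 0.05.
print(exact_power(0.0, 1.0, 100))
```

For scenarios (2)-(4) the same `in_rejection_region` check would be applied to each simulated sample, and the power estimated as the fraction of samples that land in W.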

Because scenario (4) underestimates the true variance, this technique is conservative
in rejecting $H_0$ and adopting $\hat\mu_{IPW}$. For gender, none of the scenarios quickly
reject the null hypothesis. Compare this to the number of total nominations. Here, $H_0$
is rejected even for small sample sizes.

Figure 2: Power of test as a function of sample size n for two node features, (a) gender
and (b) total nominations of friends. The black, red, blue and yellow lines are the power
under scenarios 1-4 respectively.

The final figure plots the mean square error of $\hat\mu$, $\hat\mu_{IPW}$, and $\hat\mu_{BA}$; this last
estimator is the bias adjusted estimator from Section 3. This simulation uses scenario (4),
the most realistic of the previous scenarios. After drawing the sample, compute the
following (for both gender and total nominations):


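1. The inverse probability weighted estimator

   $$\hat\mu_{IPW} = \frac{1}{n} \sum_{i=1}^{n} \frac{y(X_i)}{N \pi_{X_i}} = \frac{1}{n} \sum_{i=1}^{n} \frac{y(X_i)\, d}{\deg(X_i)}.$$

2. The sample average $\hat\mu = \frac{1}{n} \sum_{i=1}^{n} y(X_i)$.

3. The bias adjusted estimator

   $$\hat\mu_{BA} = \begin{cases} \hat\mu_{IPW} & \text{if } \{X_i\}_{1 \le i \le n} \in W, \\ \hat\mu & \text{if } \{X_i\}_{1 \le i \le n} \notin W, \end{cases}$$

   introduced in Section 3.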
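The three estimators above can be sketched in a few lines. This is a minimal illustration, assuming the sampling weights are proportional to node degree so that $N\pi_{X_i} = \deg(X_i)/d$ with d the average degree; the function names and the toy inputs are hypothetical, and the flag `rejected` stands for the outcome of the test with rejection region W.

```python
import numpy as np


def sample_average(y):
    """Unweighted mean of the sampled node features."""
    return float(np.mean(y))


def ipw_estimator(y, degrees, mean_degree):
    """Inverse probability weighted mean with weights mean_degree / deg(X_i)."""
    y = np.asarray(y, dtype=float)
    degrees = np.asarray(degrees, dtype=float)
    return float(np.mean(y * mean_degree / degrees))


def bias_adjusted_estimator(y, degrees, mean_degree, rejected):
    """Use the IPW estimator when H0 is rejected, the sample average otherwise."""
    if rejected:
        return ipw_estimator(y, degrees, mean_degree)
    return sample_average(y)


# Hypothetical toy sample: features and degrees of four sampled nodes.
y = [1, 0, 1, 1]
degrees = [2, 4, 4, 8]
print(sample_average(y), ipw_estimator(y, degrees, mean_degree=4.0))
```

In practice the degrees are self-reported and the average degree must itself be estimated, so the weights used here are an idealization.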

Figure 3 shows that for gender, the true bias of the sample average is small. As
such, the MSE of $\hat\mu$ (solid) is always smaller than that of $\hat\mu_{IPW}$ (dotted) (for sample
sizes n < 500). For total nominations, the bias is much larger. So, when n > 20, the
MSE of $\hat\mu$ is larger than the MSE of $\hat\mu_{IPW}$. The MSE of the bias adjusted estimator
$\hat\mu_{BA}$ (longdash) lies between that of $\hat\mu$ and $\hat\mu_{IPW}$. In particular, when $\hat\mu_{IPW}$ drastically
outperforms $\hat\mu$ (i.e. on the right of panel b), the null hypothesis is typically rejected and
$\hat\mu_{BA}$ performs similarly to $\hat\mu_{IPW}$.


Figure 3: The mean square error of $\hat\mu$ (solid), $\hat\mu_{IPW}$ (dotted), and $\hat\mu_{BA}$ (longdash) for
(a) gender and (b) total nominations.


7 Discussion

A recent review of the RDS literature counted over 460 studies which used RDS
[White et al., 2015]. Many of these studies seek to estimate the prevalence of HIV
or other infectious diseases; for these studies, a point estimate of the prevalence is
insufficient. These studies have used confidence intervals constructed from bootstrap
procedures and from estimates of the standard error [Handcock et al., 2016]. These
standard error intervals implicitly rely on a central limit theorem and this paper provides
a partial justification for such techniques, so long as $m < 1/\lambda_2^2$. This paper
makes a first step at studying the distributional properties of two simple estimators in
this regime.

Figure 1 suggests that if m is larger than $1/\lambda_2^2$, then the simple estimators ($\hat\mu$ and
$\hat\mu_{VH}$) are no longer normally distributed. Interestingly, under the simulation setting
where the estimators are no longer normally distributed, the Q-Q plots are flatter than
the line x = y. This indicates that a confidence interval constructed from the standard
errors would be conservative; a nominally 90% confidence interval would cover $\mu$ more
than 90% of the time. As such, a properly constructed interval should be narrower than
the interval constructed from the standard error. If one pursues this path, then care
must be taken in estimating the standard errors. For example, a bootstrap procedure
proposed in [Salganik, 2006] has become very popular. However, for reasons beyond
the inequality in Proposition 5, this bootstrap procedure drastically underestimates the
actual standard errors [Goel and Salganik, 2010, Rohe, 2015].


There are many reasons to suspect the construction of the sampling weights in RDS
studies. At the most basic level, the justification for

    {selection probability for node i} $\propto \deg(i)$

comes from a Markov model which has several problematic pieces (e.g. replacement
sampling, uniform selection of friends, referral process has reached equilibrium, and
all network relationships are reciprocated). While these assumptions are all troublesome,
they are merely sufficient conditions. It is conceivable that $\deg(i)$ is still an
adequate approximation of the selection probabilities (up to scaling) even when the
assumptions do not hold. Perhaps the most difficult problem is that $\deg(X_i)$ is estimated
via self-report. Taken together, there are many reasons to doubt the sampling weights.
In a related context, the third section of the paper discusses the bias-variance tradeoff
between $\hat\mu$ and $\hat\mu_{IPW}$. The proposition and theorem in that section presume that the
sampling weights are known exactly. However, given that the presumed model has
several deficiencies, the stationary distribution of the (presumed) Markov process does
not necessarily reveal the actual sampling probabilities. As such, $\hat\mu_{IPW}$ (and by extension
$\hat\mu_{VH}$) are constructed with incorrect and noisy measurements of the sampling
probabilities. This will likely make the estimator biased and more variable. Because of
this, in practice we should be less inclined towards the weighted estimator (i.e. $\hat\mu_{VH}$)
than the proposed estimator $\hat\mu_{BA}$ suggests. That section and the data analysis with the
AddHealth network suggest that the sample average is perhaps less biased than was
previously considered. While there are certainly situations where bias corrected estimators
should be used, it also seems sensible to first estimate the bias; when the bias is
large, this is a relatively easy task.

The theorems in this paper do not apply to general trees, only to m-trees. If T is a
Galton-Watson tree with $E(\eta(\sigma)) < \lambda_2^{-2}$, then the simulations support the following
conjecture:



where $\sigma^2$ could be computed from the results in [Rohe, 2015]. Proving this result
requires a more careful study of the structure of the tree. We leave this problem to
future investigation.

A Proof of Theorem 1


In the appendix we give proofs of the theorems and propositions in the paper. First we
give an outline of the proof of our main theorem. Consider the martingale

$$\sum_{h} \left( E(Y_h \mid \mathcal{F}_{h-1}) - Y_h \right),$$

where $\{\mathcal{F}_h\}$ is a filtration to be defined later. Using the Markov property and the
estimate of $\mathrm{var}(Y_h)$, we show that the martingale difference sequence satisfies the
conditions of the martingale central limit theorem. In this section, P will be a reversible
transition matrix with eigenvalues $1 = \lambda_1 > |\lambda_2| \ge \dots \ge |\lambda_N|$ and corresponding
eigenfunctions $f_1, \dots, f_N$ satisfying $\langle f_i, f_j \rangle_\pi = \delta_{ij}$ for any i, j. We refer
to [Levin et al., 2009] for the existence of such an eigen-decomposition. Unless stated
otherwise, expectations are calculated with respect to the tree indexed random walk on
the graph.

We begin with some lemmas.

Lemma 1. (Lemma 12.2 in [Levin et al., 2009]) Let P be a reversible Markov transition
matrix on the nodes in G with respect to the stationary distribution $\pi$. The
eigenvectors of P, denoted as $f_1, \dots, f_N$, are real valued functions of the nodes $i \in G$
and orthonormal with respect to the inner product

$$\langle f_a, f_b \rangle_\pi = \sum_{i \in G} f_a(i) f_b(i) \pi_i. \qquad (5)$$

If $\lambda$ is an eigenvalue of P, then $|\lambda| \le 1$. The eigenfunction $f_1$ corresponding to
the eigenvalue 1 is taken to be the constant vector 1. If $X(0), \dots, X(t)$
represent t steps of a Markov chain with transition matrix P, then the probability of a
transition from $i \in G$ to $j \in G$ in t steps can be written as

$$P(X(t) = j \mid X(0) = i) = P^t_{ij} = \pi_j + \pi_j \sum_{\ell=2}^{N} \lambda_\ell^t f_\ell(i) f_\ell(j). \qquad (6)$$
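Equation (6) can be verified numerically on a small example. The sketch below assumes the simple random walk on an undirected graph with symmetric adjacency matrix `A`; the helper names are hypothetical. The eigenfunctions are obtained from the symmetric matrix $D^{-1/2} A D^{-1/2}$ and rescaled so that they are orthonormal with respect to $\pi$.

```python
import numpy as np


def reversible_eigendecomposition(A):
    """Return (pi, lam, F) for the simple random walk on a symmetric graph A.

    Column l of F is an eigenfunction f_l of P = D^{-1} A with eigenvalue lam[l],
    normalized so that sum_i f_a(i) f_b(i) pi_i = delta_ab.
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    pi = deg / deg.sum()
    S = A / np.sqrt(np.outer(deg, deg))  # symmetric, same eigenvalues as P
    lam, U = np.linalg.eigh(S)
    F = U / np.sqrt(pi)[:, None]
    return pi, lam, F


def spectral_transition(A, t):
    """t-step transition matrix from the spectral formula.

    The sum runs over all eigenvalues; the term for eigenvalue 1 (constant
    eigenfunction) contributes the leading pi_j in equation (6).
    """
    pi, lam, F = reversible_eigendecomposition(A)
    return (F * lam ** t) @ F.T * pi


# Small connected graph on 4 nodes (contains a triangle, so it is aperiodic).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
P = A / A.sum(axis=1, keepdims=True)
print(np.allclose(spectral_transition(A, 5), np.linalg.matrix_power(P, 5)))
```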


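Lemma 2. For any nodes $\sigma, \tau$ in T,

$$\mathrm{cov}(y(X_\sigma), y(X_\tau)) = \sum_{\ell=2}^{N} \lambda_\ell^{d(\sigma,\tau)} \langle y, f_\ell \rangle_\pi^2,$$

where $\langle y, f_\ell \rangle_\pi = \sum_{i} y(i) f_\ell(i) \pi_i$.

Proof. From the reversibility of the Markov chain and Lemma 1, we have

$$P(X_\sigma = j \mid X_\tau = i) = P^{d(\sigma,\tau)}_{ij} = \pi_j + \pi_j \sum_{\ell=2}^{N} \lambda_\ell^{d(\sigma,\tau)} f_\ell(i) f_\ell(j).$$

Therefore,

$$\mathrm{cov}(y(X_\sigma), y(X_\tau)) = \sum_{i,j} y(i) y(j) \pi_i P(X_\sigma = j \mid X_\tau = i) - \Big( \sum_{i} \pi_i y(i) \Big)^2$$
$$= \sum_{i,j} \sum_{\ell=2}^{N} \lambda_\ell^{d(\sigma,\tau)} y(i) y(j) \pi_i \pi_j f_\ell(i) f_\ell(j)$$
$$= \sum_{\ell=2}^{N} \lambda_\ell^{d(\sigma,\tau)} \langle y, f_\ell \rangle_\pi^2,$$

and the lemma is proved. □

The next lemma gives the expression of $\mathrm{var}(Y_h)$.

Lemma 3 (Variance of $Y_h$). Suppose that $|\lambda_2| > 0$. Then as $h \to \infty$,

$$\mathrm{var}(Y_h) = \begin{cases} O(1) & \text{if } m < \lambda_2^{-2}, \\ O(h) & \text{if } m = \lambda_2^{-2}, \\ O((m\lambda_2^2)^h) & \text{if } m > \lambda_2^{-2}. \end{cases}$$

Proof. For $k = 0, 1, \dots, h$, denote by $S_{hk}$ the number of ordered pairs $(\sigma, \tau)$ such that
$|\sigma| = |\tau| = h$ and $d(\sigma, \tau) = 2k$. Then $S_{h0} = m^h$, and

$$S_{hk} = m^h (m-1) m^{k-1}$$

for $k \ge 1$. By Lemma 2,

$$\mathrm{var}(Y_h) = \sum_{k=0}^{h} \frac{S_{hk}}{m^h} \sum_{\ell=2}^{N} \lambda_\ell^{2k} \langle y, f_\ell \rangle_\pi^2 = \sum_{\ell=2}^{N} \langle y, f_\ell \rangle_\pi^2 \Big( 1 + \sum_{k=1}^{h} (m-1) m^{k-1} \lambda_\ell^{2k} \Big)$$
$$\le \sum_{\ell=2}^{N} \langle y, f_\ell \rangle_\pi^2 \sum_{k=0}^{h} (m \lambda_\ell^2)^k.$$

Since $|\lambda_\ell| \le |\lambda_2|$ for all $\ell \ge 2$, the geometric sums give

$$\mathrm{var}(Y_h) = \begin{cases} O(1) & \text{if } m < \lambda_2^{-2}, \\ O(h) & \text{if } m = \lambda_2^{-2}, \\ O((m\lambda_2^2)^h) & \text{if } m > \lambda_2^{-2}. \end{cases}$$

□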
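The pair counts in the proof of Lemma 3 can be checked by brute force on small trees. The sketch below assumes the complete m-ary tree and the normalization $Y_h = m^{-h/2} \sum_{|\sigma| = h} y(X_\sigma)$; under that assumption `variance_component` is the contribution of a single eigenvalue to $\mathrm{var}(Y_h)$, up to the factor $\langle y, f_\ell \rangle_\pi^2$. All function names are hypothetical.

```python
from itertools import product


def pair_counts(m, h):
    """Closed form for S_hk, k = 0..h: ordered pairs at depth h at distance 2k."""
    return [m ** h] + [m ** h * (m - 1) * m ** (k - 1) for k in range(1, h + 1)]


def brute_force_pair_counts(m, h):
    """Count the same pairs by enumerating all depth-h nodes as paths from the root."""
    nodes = list(product(range(m), repeat=h))
    counts = [0] * (h + 1)
    for a in nodes:
        for b in nodes:
            common = 0
            while common < h and a[common] == b[common]:
                common += 1
            # The nearest common ancestor sits at depth `common`,
            # so the tree distance is 2 * (h - common).
            counts[h - common] += 1
    return counts


def variance_component(m, lam, h):
    """m^{-h} * sum_k S_hk * lam^(2k), the growth factor attached to one eigenvalue."""
    return sum(s * lam ** (2 * k) for k, s in enumerate(pair_counts(m, h))) / m ** h


print(pair_counts(2, 3), brute_force_pair_counts(2, 3))
print(variance_component(2, 0.5, 60), variance_component(2, 0.9, 20))
```

With m = 2 the factor stays bounded for $\lambda = 0.5$ (where $m\lambda^2 < 1$) and grows geometrically for $\lambda = 0.9$ (where $m\lambda^2 > 1$), matching the first and third cases of the lemma.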


Corollary 2. For any function y on the state space, -^^=Yh —>■ 0 in probability. 
Proof. It follows from Lemma that 


var{ 



o{xf) ^ 0. 


□ 


The next lemma is a convergence argument which we will use in the proof of Theorem 1.

Lemma 4 (Slutsky's lemma). If $X_h \to X$ in distribution and $Y_h \to 0$ in probability,
then $X_h + Y_h \to X$ in distribution.

The following theorem from [Durrett, 2010] is essential to the proof of our main
theorem.

Theorem 4 (Martingale central limit theorem). Suppose that $\{Z_h\}_{h \ge 1}$ is adapted to
the filtration $\{\mathcal{F}_h\}_{h \ge 1}$ and that $E(Z_{h+1} \mid \mathcal{F}_h) = 0$ for all $h \ge 1$. Let $S_h = \sum_{i=1}^{h} Z_i$
and $V_h = \sum_{i=1}^{h} E(Z_i^2 \mid \mathcal{F}_{i-1})$. If

(1) $V_h / h \to \sigma^2 > 0$ in probability and

(2) $h^{-1} \sum_{i=1}^{h} E(Z_i^2 1_{\{|Z_i| > \epsilon\sqrt{h}\}}) \to 0$ for every $\epsilon > 0$,

then $S_h / \sqrt{h} \to N(0, \sigma^2)$ in distribution.

Now we are ready to prove our main theorems.

Proof of Theorem 1. Define $Y_h$ in the same way as in Theorem 1. Without loss of generality,
suppose that $E_\pi y = 0$. Since $m < \lambda_2^{-2}$, $\sqrt{m} P - I$ is invertible. Let
$y' = (\sqrt{m} P - I)^{-1} y$. Then y' is also a function on the state space. We will first
argue on the new node feature y' and then convert back to y. Define

$$Y'_h = \frac{1}{\sqrt{m^h}} \sum_{\tau \in T : |\tau| = h} y'(X_\tau).$$

Let $z_{hk} = \sum_{\sigma : |\sigma| = h} 1_{\{X_\sigma = k\}}$ for $h \ge 0$ and $k = 1, \dots, N$. Define $z_h = (z_{h1}, \dots, z_{hN})$,
and

$$\mathcal{F}_h = \sigma(X_\tau : |\tau| \le h)$$

for $h \ge 1$. It is obvious that $\{Y'_h\}_{h \ge 1}$ is adapted to the filtration $\{\mathcal{F}_h\}_{h \ge 1}$. Let

$$Z_h = E(Y'_h \mid \mathcal{F}_{h-1}) - Y'_h.$$

Then $\{Z_h, \mathcal{F}_h\}_{h \ge 1}$ is a martingale difference sequence. We will verify that $\{Z_h, \mathcal{F}_h\}_{h \ge 1}$
satisfies (1) and (2) in Theorem 4.

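We have

$$Z_h = \frac{m\, z_{h-1} P y'}{\sqrt{m^h}} - \frac{z_h y'}{\sqrt{m^h}} = \frac{z_{h-1} (\sqrt{m} P) y'}{\sqrt{m^{h-1}}} - \frac{z_h y'}{\sqrt{m^h}}.$$

For any $\sigma \in T$, denote by $p(\sigma)$ the parent node of $\sigma$. $Z_h$ can also be expressed as

$$Z_h = \sum_{\sigma : |\sigma| = h} \frac{1_{p(\sigma)} P y'}{\sqrt{m^h}} - \sum_{\sigma : |\sigma| = h} \frac{y'(X_\sigma)}{\sqrt{m^h}} =: \sum_{\sigma : |\sigma| = h} W_\sigma,$$

where $1_{p(\sigma)}$ is the $1 \times N$ vector with $1_{p(\sigma), i} = 1$ if $X_{p(\sigma)} = i$ and 0 otherwise.

We have

$$E(W_\sigma \mid \mathcal{F}_{h-1}) = 0$$

and

$$E(W_\sigma^2 \mid \mathcal{F}_{h-1}) = \frac{\mathrm{Var}_{p_i}(y')}{m^h}$$

for $i = X_{p(\sigma)}$, where $p_i$ is the ith row of the transition matrix P and
$\mathrm{Var}_{p_i}(y') = \sum_j P_{ij} \big( y'(j) - \sum_k P_{ik} y'(k) \big)^2$. From the definition of the tree indexed Markov process, if
$|\sigma| = |\tau| = h$, then $W_\sigma, W_\tau$ are independent given $\{X_\sigma : |\sigma| = h - 1\}$. Using
$E(W_\sigma \mid \mathcal{F}_{h-1}) = 0$, we have

$$E(Z_h^2 \mid \mathcal{F}_{h-1}) = \sum_{\sigma : |\sigma| = h} E(W_\sigma^2 \mid \mathcal{F}_{h-1}) = \sum_{i=1}^{N} \frac{z_{h-1,i}}{m^{h-1}} \mathrm{Var}_{p_i}(y').$$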
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From Corollaryj^ var{^) = 0(A|^) —>■ 0 for every i. Thus var{E{Z‘^\^fi-i)) = 
0(A2^) — 0{h —>• oo) and var{E{Zf\^i-i)) converges. It follows from the 
definition of 14 and the Cauchy-Schwarz inequality that 


lim variVh/h) = 

h—>(x> 


lim var 

h—>(x> 


(4 E E{Z!\^^-i)) < ^Ihn ^^var{E{Zf\^,_,)) = 0 . 

i=l °° i=l 


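Therefore

$$V_h / h \to \sigma^2$$

in probability, where

$$\sigma^2 = E(E(Z_h^2 \mid \mathcal{F}_{h-1})) = \sum_{i=1}^{N} \pi_i \mathrm{Var}_{p_i}(y') = \sum_{i=1}^{N} \pi_i \Big( \sum_{j=1}^{N} P_{ij} y'(j)^2 - \big( \sum_{j=1}^{N} P_{ij} y'(j) \big)^2 \Big)$$
$$= \mathrm{var}_\pi(y') - \mathrm{var}_\pi(P y'),$$

and condition (1) in Theorem 4 is satisfied.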

Similarly, we have 

E{Zf^\^n-i)= E E{W^\^h-i)+ E E{W^W^\^h-i) 

a-.\a\ = h ct , t :|( t | = | t|=/i 

^ Co m"(m" - 1) / r 
S —r + ^ - E 


m'‘ 


n2h 


where Co, Ci, C are constants. Thus E{Z^) < C for any h, and 




{\Xi\>eVh} 








1 ^ C 

^E(X‘)<2^0. 


2=1 2=1 
Condition (2) is also satisfied. From Theorem]^ we have 


2=1 


1 

Vh j 


h 1 ^ 

E^. = 4E( 


zJ_^{y/mP)y' zjy'. 


2 = 1 


y/h '/mF^ 


N{0,F) 


/m‘ 


in distribution. If m < A 2 then from Lemma -^ 0 probability. From 

Lemma 1^ and the definition of y'. 


^ i=i 


in distribution, where $\sigma^2 = \mathrm{var}_\pi(y') - \mathrm{var}_\pi(P y') = \mathrm{var}_\pi((\sqrt{m} P - I)^{-1} y) -
\mathrm{var}_\pi(P (\sqrt{m} P - I)^{-1} y)$. The proof is now complete.

□ 
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B Proof of Theorem and Corollary [I] 

We provide a proof of the central limit theorem using the moment method. It involves
a careful study of all the moments of $\hat\mu_h$. The following proposition is essential to our
proof.

Proposition 6. (Moment continuity theorem) Let $X_h$ be a sequence of uniformly subgaussian
real random variables, and let X be another subgaussian random variable.
Then the following statements are equivalent:

(1) $E X_h^k \to E X^k$ for all k

(2) $X_h \to X$ in distribution

Following this proposition, we can break our proof into two parts. We will first
prove that all the moments of $\hat\mu_h$ converge to the moments of some normal distribution.
Then we will verify that $\hat\mu_h$ is a uniformly subgaussian sequence.

B.l Proof of moments convergence 

Let $X_r$ be the root of the 2-tree. Define

$$\gamma_{k,h}(i) = E[\hat\mu_h^k \mid X_r = i],$$

and

$$\gamma_{k,h} = E[\hat\mu_h^k].$$

Let $\rho = \sqrt{2} |\lambda_2| < 1$. We will prove that there exist $\gamma_k$, $k \ge 1$, such that $|\gamma_{k,h}(i) - \gamma_k| =
O(\rho^h)$ for all k, i, h and that $\gamma_k = E(\xi^k)$ for $\xi \sim N(0, \gamma_2)$. Our key observation is that
the left and right subtree can be seen as i.i.d. copies of the whole tree given the left and
right child of the seed, which makes it possible to build a relationship between $\gamma_{k,h}(i)$
and $\gamma_{k,h-1}(i)$. Only condition (c3) is needed throughout the proof.

We need the following lemma.

Lemma 5. Let $\{a_h\}$ be a sequence satisfying

$$a_h = c_h (c\, a_{h-1} + C + d_h),$$

where $|1 - c_h| = O(\rho^h)$, $|d_h| = O(\rho^h)$, C is a constant and $c < \rho < 1$. Then
$|a_h - C/(1-c)| = O(\rho^h)$.

Proof Without loss of generality, suppose that Ch 0 and (7 = 0. Since 11 — c^, | = 
0{p^) and \dh\ = 0{p^), there exists M such that I'^^l — 

for all h. Therefore, 

\ah\ = y^c^-'^dk n Ic*l< -P". 

k—O i—h—k 

and the lemma is proved. □ 
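The convergence rate in Lemma 5 can be illustrated by iterating the recursion directly. The sketch below uses the hypothetical choices $c_h = 1 + \rho^h$ and $d_h = \rho^h$, which satisfy the assumptions of the lemma; the function name and the parameter values are illustrative only.

```python
def iterate_recursion(c, C, rho, steps, a0=0.0):
    """Iterate a_h = c_h * (c * a_{h-1} + C + d_h) and record |a_h - C / (1 - c)|."""
    limit = C / (1.0 - c)
    a = a0
    errors = []
    for h in range(1, steps + 1):
        c_h = 1.0 + rho ** h  # satisfies |1 - c_h| = O(rho^h)
        d_h = rho ** h        # satisfies |d_h| = O(rho^h)
        a = c_h * (c * a + C + d_h)
        errors.append(abs(a - limit))
    return a, errors


# Illustrative parameters with c < rho < 1.
a, errors = iterate_recursion(c=0.3, C=2.0, rho=0.5, steps=30)
ratios = [errors[h - 1] / 0.5 ** h for h in range(10, 31)]
print(a, max(ratios))
```

The ratios $|a_h - C/(1-c)| / \rho^h$ remain bounded, which is the numerical counterpart of the $O(\rho^h)$ statement.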
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We use an induction on k. First, we will prove that 71 = 0. In fact, from Lemma[^ 

N 

E{y{X,)\Xr = i) =Y,yU)Pl;' = 0(|A2|I"I). 

i=i 

Therefore 

E{yf,\Xr = z) = ^ ^ 2 '= 0 (|A 27 = 0{p'^) 

V 2 

for all i, and | 7 i,/i( 2 ) — 7 i | = 0{p^) for 71 = 0 . 

Now we move from k — ltok. Without loss of generality, suppose that 72, /t(*) > 1 
for all h,i (or we can multiply y with a large constant). It follows that ^ 2 k,h{i) > 
(72, > 1 for all k. We can decompose jk,h{'i) into 

7 fe,„(*)=i?[ 7 |X, = *]-i?[(^ E X,)'^\Xr=z] 

v 2 CTeT,o<|<T|<ti 

+ E[i^ E X,)>^\Xr=t] 

v2 (TgT,0<|<T|</l 

:= h + h 

If k is even, then from Holder’s inequality we know that 

|/il =|it;[A^|X, = *] - E[ipk - = *]| 

= E = 

m=l V™/ (8) 

<[(1 + MV2”'‘)'= - l]E[fii\Xr = i]. 

Likewise, If k is odd, we have 

|/l| =E[p^,\Xr = l] - E[{p, - = z] 

V2^ (9) 

<[(i + mV2~'")'^ - = i] 

Since k is fixed, E[jj,^~^\Xr = i] is bounded from our assumption on ^k-i,h{i), and 
(1 + - 1 = 0(72"'“) = 0{p^). Hence, 

|/i| < [{l + MVT''f-l]E[pl-^\Xr = i] = 0{p'^). 
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Let Xic and Xrc be the left and right child of the root and T; and T,. the left and 
right subtree, we have 

^ V2 a^fr.,\a\<h V2 o-eT,,|cr|</i 

^ ^ PiuPiv^ii p- ( ^ ^ ^cr ~t” ^ ^ ^a-)) l^r L ^Ic ^rc tj) 

V2 crGT,.,|CT|<ti V2 CTGTi,|o-|<ti 

= ', PiuPivE{{—j={ 'y ' ^CT H ^ ^ ^cr)) l^lc = U, Xrc = v) 

u,v V2 aefr,W\<h V2 crGTi,|o-|</i 

= y^PmP™^-7:(7/c.?i-i(M) +7fc./i-l(t')) + Si 

= \_^ ^Ptulk,h-l{u) + 


where 


Si=y^ —^ ) y^^PiuPivl7n,h-l{u)^k-ni,h-l{v). 

./o \mJ 

m = 1 V ^ 7/ -7) 


^^172 

If fc = 2, Equation|^and[^reduce to 

7/c.ft(*) = '^Piujk,h-i{u) + Shii), 


( 10 ) 


where 


Sh{i) = 


y{if 2y{i) 


+ {^P^ull,h(u)f = 0 {p^). 


2" V 2 '^ 

Write Vh = {llY and 5h = {5h}'■ For n>2, 

Vh = Pvh-i + 5h. 

Thus by setting (5i = 0 we have Vh = +X)/c=i F’^^ft,-fc,and it is not hard to verify 

that all the components of Vh (i.e., every 'yk,hi‘i)') converge to 72 = ttVi + 
with rate . 

Now suppose that k > 2. There are a fixed number of terms in ^i. Since |7/,/i(j) — 
7i I = 0{p^) for all i G S' and Z < fc — 1, we have 


fc-i 


1 ^ 1 -E 


1 fk 




^m'yk—m\ — )■ 


Thus, 


k-l 


h = EF™7fc./*-iH + E ^ ^ 

v 2 u m=l v 2 


lmlk-m+ 0 {p^). (11) 
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Combining Equation [ 8 |[^ and pT|we arrive at the final equation for 


k-l 


— Ck,h^2 — C^k,h{ 


V 2 ‘ 


k-2 Z_^ 


Piu'Jk 








( 12 ) 

where Ck.h = 1 if A: is odd and Ck,h = 1 + 0 {p^) G [2 — (1 + My /2 (1 + 
MV2~ )'=] if k is even. Since ^k -2 < P and Y[i=i Ck,h converges, we conclude 
from Lemma|3that 

l7fe,ft(*) - 7fc| = 0(p'*), 


where 7 fc = ^(^) 7 m 7 fc-m/(l - ^f^)- 

We have proved that \ jk,hi'i) — 7fc| = 0{p^) for all k, i, h. Let h tend to infinity in 
Equation we have 


Ik = 



(13) 


Now suppose that /i is a sequence of i.i.d A^(0, 72 ) variables. Let 


Ik = 


lim E(( 

h—¥oo 


'Cl + ■ ■ • + 

y/h 




then { 7 fe}, fc S N also follows Equation [Ts] Since 71 = 71 = 0 and 72 = 72 we have 
Ik = 7fe for every k, and the argument is proved. 


B.2 Proof of uniform subgaussianity 

To prove that ph are uniformly subgaussian for all h, we need to show that there exists 
some 9 such that 

for all i. 

Let Cl be a large constant to be defined later and Ch+i = (1 + M{ 2 X' 2)~^){1 + 
{y/2\'2)^)ch, where X'^ = max{|A 2 |, 2/3} and M = ||j/||oo- Let = \\li,h{i)\\cx>- 
Since 0 < y/2X'2 < 1 < 2 A 2 , C/ 1+1 > c/i and 9 = lim;i_ioo Ch exists. Thus it suffices 
to prove that 

S2t,h < cj/l2t (14) 

and 

S2t-l,h < (/h ^{'^^’ 2)^121 (15) 

for all i and h. 

Again we use an induction on 1. Since 7 i,/i(z) = 0 (|A 2 |^), we can choose ci large 
enough such that the inequalities in Equation [l4| and [Ts] hold for all {h,tj with h = 1 
or £ = 1. Suppose thatandare verified for all i < k. We will prove that they are 
also true for i = k + 1. 
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From condition (cl) and (c2), we know that 


\'^^Piul2k+l,h{u)\\oo < A2||72fc+l,?i(0ll oo — ^2S2k+l,h- 


From our assumption of induction we have 


^ I If-/o“^'|2fc+l -|l 2fc I 2fc+l (1^2^21)^ ,/2fc+ 1 

S 2 /c+l,?i < [(1 + AZ V2 ) - lJC/t-l72/c + C;j_i - nfc-l -2 n 


V2- 


+ E( 


m—l 


2k+ 1 

2m — 1 


+ 


2fc + l 
2m 


)l2k+2-2ml2 71 


2fc + 1 
2fc + 1 


0 

72fe+2) 


72fc+2 


= [(1 + My/2 - l]c^^i72fc + cf+/ 

{\V2\'/\f ^(2k + 2'' 

0 


72- 


-2k-\-2 


l2k+2 + ^ 


m—l 


2k+ 2 


2m 


72fc+2-2m72m + 


2k+ 2 
2k+ 2 


72fe+2) 


= [(1 + MV2~Y^^ - 1]4-i72Zc + cf_V(|V2A'|)S2fc+2 

< ([(1 + - 1](|V2A'1)-^ + l)cf:V(|y2A'|)S2fe+2 

< (1 + M(2A')-")2'=+icf_V(|V2A'|)S2fc+2 

<cf+i(|y2A'|)S2Zc+2, 

and Equation[^is true for 2k + 1. 

Now we move from 2fc + 1 to 2fc + 2. Recall that 


(16) 


2fe+2 


1 f2k + 2 

m 


l2k+2M{i) = C2k+2,h{ ^ _2/c+2 

m^O V2 

2fe+2 - /piU I p 

<(l + M2-‘)“+f^—55^' 

m=0 V2 


Xlp2KP2,;7m./«-l(M)72fc+2 — 771,/l—1 (^')) 
U,V 

Sm,h—lS2k-\-2—m,h—l)- 


Thus, 


2fe+2 


s2fc+2,. < (1 + M2-^r+^ij;^ —4^ ^ 


Let 


fc+i 




^ /o 
1=0 V ^ 


1 /2k + 2 


m 


■SttI,/!— l'52fc + 2 —771,/l—l)- 


and 


=oy2"''^"V 2m 
1 /2k+ 2 

^0 V2^ + 1 

Then S 2 k+ 2 ,h < (1 + M2-^f'^+^{h + / 2 ). 


^^=E 


S2m,h-lS2k+2-2m,h-l 


S2m+l./i-lS2fc+l-2m,Zi-l- 
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We have 


fc+i 


1 f2k + 2 

V 2to 

m=0 ^ 

k+1 


2m 2fe+2-2m„, 

C/i_i72mC^_l 72fc+2-2m 


_ 2fc+2 


1 (2k + 2 

^ V 2m 

m=0 ^ 


( 17 ) 


72m72fc+2-2m 


— C^h-il2k+2, 


where the last equality follows from Equation[^ On the other hand 


1 


E /«2fe+2 t 2m + 1 

m—0 V ^ 


2fc + 2 


= (^A'^ 


2m+l/ /7\\f\h — l 2fc+l —2m/ 1 

C^_l (V2A2) 72m+2C;,_i (V2A2) 72fc+2-2r7 

1 f2k + 2\ 

I 72m+272fc+2-2m- 

(18) 


=0 V2m + 1 


It can be directly verified that for all m, 


(2k+ 2 
\2m + 1 


72m+272fe+2-2m < 2(/c + 1) 72fc+2- 

' ml 


Thus, 

h < cf_7(v^A'^ ^i+2 2(fc + 1) (^)72fe+2 
m^o v2 A / 

= cf_7(y2A'2)2(''-i)(fc + l)72fe+2 

< cf_7(y2A'2)'‘2fc72fc+2. 

Combining Equation and we have 

/i + /2 < cf_+ 72 fc+ 2 (l + 2fc * (y2A^)'‘) 

< cf_772fc+2(l + {V2>!^tf’^+^. 


(19) 


( 20 ) 


Therefore, 


Shak+2 < C2k+2,h{h + h) 

< 47^(1 + M2-'')2'=+2(1 + (V2A'2)'‘)2'=+72fc+2 (21) 

^2fc+2 

S C/j 72/C+2, 

and the theorem is proved. 


B.3 Proof of Corollary!^ 

Proof. By Theorem]^ and Slutsky’s lemma, it suffices to prove that d ^ din proba¬ 
bility. Let D = maxi<i<Ndeg{Xi). Eor any a G T, Et^ ^ = i. Thus Ek = A, 
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and it follows from Theorem that d ^ dm probability. Since d,d < D, we have 
Pi\d — dl > e) < P(| 4 — 4| > 0 for all e > 0, and the corollary is proved. 

d “ 

□ 


C Proof of Proposition and Theorem 


Proof of Proposition^ Since j/(l),..., y{N) are uncorrelated, we have 

^2 if 

P2 if i= j 

Therefore, 


EiyiX,)y{X,)) = / Ei=i if * J 


/.X y 2 ni n-i 
var(jl) = -^- 


N 


N 


n n n 


(M2 (1 - XI 


Similarly, we have 


^ 1 , n-l y{Xi)y{X2)^ 

var{np„) = -v^ri — ) + _e„(——) 

^ 1 2 1 1 

M2 1 MI , ^ - 1/M2 , 2/1 In 2n 

= — -+ —+Mi(i - - Ml)- 

n ^—' X^TTi n n N w 


Let 


VD = var{flipw) — var{jl) 

AT 1 1 ^ 

i—1 i- 

Since maxi<i<NNTri = Ci, N'^Tii'Kj < Cf for all i,j. Therefore 

Ev4^-i = IEE( “ 


( 22 ) 


i=l 


N ^ N N , - ,— ^ N N , 12 

1 - 1 n 2 _ 1 (TTi - TTj) 


2=1 


We have 

N N 


2^^ N’^TTiTTj 


= .EE 

i=l j=l 


N N 


(23) 


1 2=1 j = l 


^ ^ N N N N N 

P E E = P E E(^2'+-EE = ^E ^2 -1 ^^4) 


i=ij=i 


2=1i=i 


2=1 j = l 


= N’^variir). 


From Eq|^ and 


> (^ 7 ^-- Nvar{y))var{Tr), 

C{n 


and the proposition is proved. 


□ 
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Proof of Theorem^ For the (T, P)-walk on G, the variance of jlipw jl is given 
by 


N 


'(A,pw) = + '"‘’■(y) Y. 

n N^TTi n ^ 

i—\ (7^T i—'\. 


N d(cr,r) 

— P^i 


Nh 


and 


N 


"(A) = — —^ ^ X! 

n n •“ 

<7^T i—1 


d(<7,r) 


Let 


VD = var{fiipw) — var{(i) 


i—1 cr^T i—1 

Since CiN < di < C 2 N, we have tt^ < for all i and for all i,j, t. 

Thus, 


EEC 


N^TTi 




a^T i—1 

From equation]^ andwe know that 


N 


N d{<7,T) . IV ^ 




a^Ti—1 


NCl 


N 

E 


1 

iv^ 




— 1 > ^ ^ varij:). 

Co 


Therefore, 




We conclude the proof by taking C = 


ct 

CaCf ■ 


□ 


D Proofs of Propositions and 

Before the proofs, some more notation is necessary. Let $|\lambda_1| \ge |\lambda_2| \ge \dots \ge |\lambda_N|$
denote the eigenvalues of P. Denote by $f_1, \dots, f_N : V \to \mathbb{R}$ the corresponding
eigenfunctions of P.

The following is a proof of Proposition]^ 

Proof. Proposition 1 in [Khabbazian et al., 2016] says that

$$\mathrm{Cov}(y(X_{p(\tau)}), y(X_\tau)) = \sum_{\ell=2}^{N} \langle y, f_\ell \rangle_\pi^2 \lambda_\ell.$$

Thus, from the lemma and the definition of R, R is a convex combination of the eigenvalues

$$R = \sum_{\ell=2}^{N} \frac{\langle y, f_\ell \rangle_\pi^2}{\mathrm{var}_\pi(y)} \lambda_\ell.$$

Lemma 6.

$$\sum_{\ell=2}^{N} \frac{\langle y, f_\ell \rangle_\pi^2}{\mathrm{var}_\pi(y)} = 1.$$

The proof of Lemma 6 is given on page 342 of [Levin et al., 2009] and is repeated
below for completeness.

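From Theorem 1 in [Rohe, 2015],

$$\sigma^2 = \sum_{\ell=2}^{N} \langle y, f_\ell \rangle_\pi^2 G(\lambda_\ell). \qquad (26)$$

Applying Jensen's inequality yields

$$\sigma^2 = \mathrm{var}_\pi(y) \sum_{\ell=2}^{N} \frac{\langle y, f_\ell \rangle_\pi^2}{\mathrm{var}_\pi(y)} G(\lambda_\ell) \ge \mathrm{var}_\pi(y)\, G(R). \qquad (27)$$

□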


The proof of Proposition]^ is similar. 

Proof. In the case when y{i) = y, + af{i), this implies that y = yfi -f afj. By 
the orthonormality of the eigenvectors, “ ^"tJ “ ^ ^ such, the 

inequality in equation ( jZT] ) holds with equality. □ 

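The following is a proof of Lemma 6.
Proof.

$$\mathrm{var}_\pi(y) = E_\pi(y^2) - (E_\pi y)^2 = \sum_{j=1}^{N} \langle y, f_j \rangle_\pi^2 - (E_\pi y)^2 = \sum_{j=2}^{N} \langle y, f_j \rangle_\pi^2.$$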


The proposition proved above via Jensen's inequality presumes that G is convex. Figure 4 plots G
for twenty different Galton-Watson trees with offspring distribution p(0) = .1, p(1) = .1, p(2) =
.3, p(3) = .5. This offspring distribution has expected value 2.2. The construction
of each tree was stopped when it reached 5000 nodes; if it failed to reach 5000 nodes,
then the process was started over. In these simulations and in others not shown, G is
often convex. When it is not convex, its second derivative is positive away from −1.
This simulation was selected because it shows that even when the trees are sampled
from the same distribution, even when there is nothing strange about the offspring
distribution (e.g. all moments are finite), and even when the tree is very big,
some of the trees have a convex G and some of the trees have a
non-convex G. Similar results hold when the trees have 500 nodes; the only thing that
changes is that the red regions extend slightly further away from −1.
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Some G functions are not convex. 
However, they are all convex away from -1. 



Figure 4: Each line corresponds to a function G for a random Galton-Watson tree.
Some of the curves are not convex; the regions of these functions which have a negative
second derivative are highlighted in red. While some of the black lines appear to have
a negative second derivative, they do not; this illusion is due to the log transformation
on the vertical axis.
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